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We have performed multicanonical simulations of hydrophobic-hydrophilic heteropolymers with 
two simple effective, coarse-grained off-lattice models to study the influence of specific interactions 
in the models on conformational transitions of selected sequences with 20 monomers. Another 
aspect of the investigation was the comparison with the purely hydrophobic homopolymer and 
the study of general conformational properties induced by the "disorder" in the sequence of a 
heteropolymer. Furthermore, we applied an optimization algorithm to sequences with up to 55 
monomers and compared the global-energy minimum found with lowest-energy states identified 
within the multicanonical simulation. This was used to find out how reliable the multicanonical 
method samples the free-energy landscape, in particular for low temperatures. 
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I. INTRODUCTION 

The understanding of protein folding is one of the 
most challenging objectives in biochemically motivated 
research. Although the physical principles are known, the 
complexity of proteins as being macromolecules consist- 
ing of numerous atoms, the influence of quantum chemi- 
cal details on long-range interactions as well as the role of 
the solvent, etc., makes an accurate analysis of the folding 
process of realistic proteins extremely difficult. There- 
fore, one of the most important questions in this field is 
how much detailed information can be neglected to estab- 
lish effective, coarse-grained models yielding reasonable, 
at least qualitative, results that allow for, e.g., a more 
global view on the relationship between the sequence 
of amino acid residues and the existence of a global, 
funnel-like energy minimum in a rugged free-energy land- 
scape 0. 

Within the past two decades much work has been de- 
voted to introduce minimalistic models based on general 
principles that are believed to primarily control the struc- 
ture formation of proteins. One of the most prominent 
examples is the HP model of lattice proteins which 
has been exhaustively investigated without revealing all 
secrets, despite its simplicity. In this model, only two 
types of monomers are considered, with hydrophobic {H) 
and polar (P) character. Chains on the lattice are self- 
avoiding to account for the excluded volume. The only 
explicit interaction is between non-adjacent but next- 
neighbored hydrophobic monomers. This interaction of 
hydrophobic contacts is attractive to force the formation 
of a compact hydrophobic core which is screened from the 
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(hypothetic) aqueous environment by the polar residues. 
Statistical mechanics simulations of this model are still 
subject of studies requiring the application of sophisti- 
cated algorithms [1,^1^. 

A manifest off-lattice generalisation of the HP model 
is the AB model where the hydrophobic monomers 
are labelled by A and the polar or hydrophilic ones by 
B. The contact interaction is replaced by a distance- 
dependent Lennard-Jones type of potential accounting 
for short-range excluded volume repulsion and long-range 
interaction, the latter being attractive for AA and BB 
pairs and repulsive for AB pairs of monomers. An addi- 
tional interaction accounts for the bending energy of any 
pair of successive bonds. This model was first applied in 
two dimensions ^ and generalized to three-dimensional 
AB proteins |3, la| , partially with modifications taking 
implicitly into account additional torsional energy con- 
tributions of each bond. 

More knowledge-based coarse-grained models are often 
of Go type, where, in the simplest lattice formulation, 
the energy is proportional to the negative number of na- 
tive contacts. These models usually require the knowl- 
edge of the native conformation and serve, e.g., as models 
for studies of folding pathways 0, and native topol- 
ogy [iTl ll^ . There is also growing interest in modeling 
the folding behavior of single-domain proteins with sim- 
plified models as many of them show up a simple two- 
state kinetics without intermediary states that would 
slow down the folding dynamics ( "traps" ). It seems, how- 
ever, that native-centric models such as those of Go type 
require modifications for a qualitati vely correct descrip- 
tion of this sharp folding transition [ij, [13 . 

In this paper we report studies of thermodynamic and 
ground-state properties of AB sequences known from the 
literature 0, 0, 0| for two representations of the AB 
model l^l?! which are described in Sec.m While, com- 
pared to all-atom formulations, the interactions in these 
coarse-grained models are greatly simplified and hence 
can be computed much faster, they still preserve a com- 
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plicated rugged free-energy landscape, where naive sim- 
ulations would easily get trapped. In order to accurately 
resolve the low-temperature behavior we, therefore, ap- 
plied a multicanonical Monte Carlo algorithm fisl [l9l | 
with an appropriate update mechanism. For simulations 
of systems with complex free energy landscapes, multi- 
canonical sampling has become very popular and its ap- 
plication in protein simulations has a long tradition [20j . 
The ground-state search was achieved by means of the 
energy landscape paving minimizer (ELP) |2H . Sec- 
tion mil is devoted to the description of these methods. 
We present the results for the global energy minima and 
the thermodynamic quantities in Sec. IIVI The paper is 
concluded by the summary in Sec. 



II. EFFECTIVE OFF-LATTICE MODELS 

We investigated two effective off-lattice models of AB 
type for heteropolymers with TV monomers. The first one 
is the original AB model as proposed in Ref. 6] with the 
energy function 
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where the first sum runs over the {N — 2) angles < 
■iJfc < TT of successive bond vectors. This term is the 
bending energy and the coupling is "ferromagnetic", i.e., 
it costs energy to bend the chain. The second term par- 
tially competes with the bending barrier by a potential of 
Lennard- Jones type depending on the distance between 
monomers being non-adjacent along the chain. It also 
accounts for the infiuence of the AB sequence {at = A 
for hydrophobic and (7^ = B for hydrophilic monomers) 
on the energy of a conformation as its long-range behav- 
ior is attractive for pairs of like monomers and repulsive 
for AB pairs of monomers: 
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where is the bond vector between the monomers k 
and k + 1 with length unity. In Ref. |^ different values 
for the parameter set K2) were tested and finally set 
to (—1, 0.5) as this choice led to distributions for the 
angles between bond vectors and b^+i as well as the 
torsion angles between the surface vectors b^ x b^+i and 
bfe+i X bfe_|_2 that agreed best with distributions obtained 
for selected functional proteins. Since b^ -bfc+i = cost^fe, 
the choice ki — —I makes the coupling between suc- 
cessive bonds "antiferromagnetic" or "antibending" con- 
trary to what was chosen in Eq. ^ for AB model I. 
The second term in Eq. Q takes torsional interactions 
into account without being an energy associated with the 
pure torsional barriers in the usual sense. The third term 
contains now a pure Lennard-Jones potential, where the 
1 /r^j long-range interaction is attractive whatever types 
of monomers interact. The monomer-specific prefactor 
C// (cj , (Tj ) only controls the depth of the Lennard-Jones 
valley: 



+ 1, 
-1/2, 



CTi, fTj = A, 

ai,aj — B 



or CTj ^ Gj 



(4) 

For technical reasons, we have introduced in both mod- 
(1) els a cut-off rij = 0.5 for the Lennard-Jones potentials 
below which the potential is hard-core repulsive (i.e., the 
potential is infinite). For both models only a few results 
are given in the literature. For the first model these are 
estimates for the global energy minima of certain AB 



sequences |16( , while for AB model II primarily thermo- 
dynamic quantities were determined We have per- 
formed ELP optimizations '2l'| in order to find deeper- 
lying energy minima than the values quoted and, in par- 
ticular, multicanonical sampling |l8l ll9j for enabling us 
to focus on thermodynamic properties of the sequences 
given in Ref. 7J. 



III. METHODS 



Ci{ai,aj) = 




A, 
B, 



(2) 



We will refer to this model as AB model I throughout 
the paper. 

The other model we have studied has been introduced 
in Ref. 't'I and is a variant of AB model I in that it also 
consists of angular and distance-dependent energy terms, 
but with some substantial modifications. In the follow- 
ing, we denote it as AB model II. The energy is given 

by 
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In this section we describe the computational sampling 
methods and the update procedure we applied to obtain 
results for off-lattice AB heteropolymers. 



A. Energy-landscape paving optimization 
procedure 

In order to locate global energy minima of a complex 
system, it is often useful to apply specially biased al- 
gorithms that only serve this purpose. We used the 
energy landscape paving (ELP) minimizer p]| to find 
global energy minima of the sequences under consider- 
ation. The ELP minimization is a Monte Carlo opti- 
mization method, where the energy landscape is locally 
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flattened. This means that if a state x with energy E{x) 
is hit, the energy is increased by a "penalty" which itself 
depends on the histogram of any suitably chosen order 
parameter. The simplest choice is the energy distribution 
H{E) such that we define Et = Et + f{H{Et)). Thus, the 
Boltzmann probability exp{—Et/kBT) for a Metropolis 
update, where fcsT is the thermal energy at the tem- 
perature T, becomes a function of "time" t. The ad- 
vantage of this method is that local energy minima are 
filled up and the likelihood of touching again recently 
visited regions decreases. This method has successfully 
proved to be applicable to find global en erg y minima in 
rough energy landscapes of proteins 0,|23|. Of course, 
as a consequence of the biased sampling, stochastic meth- 
ods along the line of ELP violate detailed balance and it 
is therefore inappropriate to apply those for uncovering 
thermodynamic properties of a statistical system. 



B. Multicanonical method 

To obtain statistical results we applied a multicanon- 
ical Monte Carlo algorithm, where the energy distribu- 
tion is flattened artificially allowing, in principle, for a 
random walk of successive states in energy space. This 
fiattening is controllable and therefore reproducible. For 
this purpose, the Boltzmann probability is multiplied by 
a weight factor W{E), which in our case is a function 
of the energy. Then the probability for a state with en- 
ergy E reads pm{E) ~ cx.p{~E/kBT)W{E). In order 
to obtain a multicanonical or "flat" distribution, the ini- 
tially unknown weight function W{E) has to be chosen 
accordingly what is done iteratively: In the beginning, 
the weights H/'"' (E) are set to unity for all energies let- 
ting the flrst run be a usual Metropolis simulation which 
yields an estimate H'^^\E) for the canonical distribution. 
The histogram is used to determine the next guess for the 
weights, the simplest update is to calculate W'^^\E) — 
H'^°\E)/W^^\E). Then the next run is performed with 

probabilities = eyi.-p{-E /kBT)W'^^\E) of states 

with energy E. The iterative procedure is continued un- 
til the weights are appropriate in a way that the his- 
togram is "flat". The introduction of a flatness crite- 
rion, for instance, that the fluctuations around the aver- 
age value of the histogram are less than 20%, is use- 
ful, but not necessary, if the number of iterations is 
very large. After having determined accurate weights 
W{E)^ they are kept fixed and following some thermal- 
ization sweeps a long production run is performed, where 
statistical quantities O are obtained multicanonically, 
(0)m = E{x}m(£^({x}))0({x})/ZM with the muhi- 
canonical partition function Zm — S{x} ^'^^(-^({■''■}))- 
The canonical statistics is obtained by reweighting the 
multicanonical to the canonical distribution, i.e., mean 
values are computed as (O) = {OW~^) m / (W^^) m ■ 

For the determination of the multicanonical weights we 
performed 200 iterations with at least 10^ sweeps each. 




FIG. 1: Spherical update of the bond vector between the ith 
and (i 4- l)th monomer. 



The histograms obtained in the iteration runs were ac- 
cumulated crror-weig hted Tg*! such that the estimation 
of the weights was based on increasing statistics and not 
only on the relatively small number of sweeps per run. 
In the production period, 5 x 10^ sweeps were generated 
to have reasonable statistics for estimating the thermo- 
dynamic quantities. Statistical errors are estimated with 
the standard Jackknife technique [S^, I3 • 



C. Spherical update mechanism 

For updating a conformation we use the procedure dis- 
played in Fig. n Since the length of the bonds is fixed 
{\hk\ = 1, fc = 1, . . . , TV— 1), the (z-l-l)th monomer lies on 
the surface of a sphere with radius unity around the ith 
monomer. Therefore, spherical coordinates are the natu- 
ral choice for calculating the new position of the (i-l- l)th 
monomer on this sphere. For the reason of efficiency, we 
do not select any point on the sphere but restrict the 
choice to a spherical cap with maximum opening angle 
20max (the dark area in Fig. Thus, to change the 
position of the {i + l)th monomer to (z + 1)', we select 
the angles 9 and if randomly from the respective intervals 
cos ^max < COS 9 < 1 and < </3 < 27r, which ensure a uni- 
form distribution of the (« -I- l)th monomer positions on 
the associated spherical cap. After updating the position 
of the {i -f l)th monomer, the following monomers in the 
chain are simply translated according to the correspond- 
ing bond vectors which remain unchanged in this type 
of update. Only the bond vector between the ith and 
the (z -f l)th monomers is rotated, all others keep their 
direction. This is similar to single spin updates in local- 
update Monte Carlo simulations of the classical Heisen- 
berg model with the difference that in addition to local 
energy changes long-range interactions of the monomers, 
changing their relative position to each other, have to 
be computed anew after the update. In our simulations 
of the AB models we used a very small opening angle, 
cos6'inax — 0.99, in order to be able to sample also very 
narrow and deep valleys in the landscape of angles. 
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TABLE L The four Fibonacci sequences, for which we analyzed the ground states in both models. 

sequence 

13 AB2AB2ABAB2AB 

21 BABAB2ABAB2AB2ABAB2AB 

34 AB2AB2ABAB2AB2ABAB2ABAB2AB2ABAB2AB 

55 BABAB2ABAB2AB2ABAB2ABAB2AB2ABAB2AB2ABAB2ABAB2AB2ABAB2AB 



TABLE II: Estimates for global energy minima obtained with multicanonical (MUCA) sampling and ELP minimization |2lll 
for the Fibonacci sequences of Table |T| using both models. The values for AB model I are compared with results quoted in 
Ref. employing off-lattice PERM and after subsequent conjugate-gradient minimization (PERM-I-). Minimum energies 
found with MUCA and ELP for AB model II are compared with the lowest energies listed in Ref. QJl, obtained with annealing 
contour Monte Carlo (ACMC) and improved by Metropolis quenching (ACMC-f). 
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AB model II 
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13 


-4.967 


-4.967 


-3.973 


-4.962 


-26.496 


-26.498 


-26.363 


-26.507 


21 


-12.296 


-12.316 


-7.686 


-11.524 


-52.915 


-52.917 


-50.860 


-51.718 


34 


-25.321 


-25.476 


-12.860 


-21.568 


-97.273 


-97.261 


-92.746 


-94.043 


55 


-41.502 


-42.428 


-20.107 


-32.884 


-169.654 


-172.696 


-149.481 


-154.505 



TABLE III: Root mean square deviations rmsd and overlap 
Q of lowest-energy conformations found with multicanonical 
sampling and with the ELP optimizer for the Fibonacci se- 
quences of length given in Table Q 





AB model I 


AB model II 


N 


rmsd 





rmsd 


Q 


13 


0.015 


0.994 


0.006 


0.998 


21 


0.025 


0.992 


0.009 


0.997 


34 


0.162 


0.979 


1.412 


0.840 


55 


2.271 


0.766 


1.904 


0.857 



IV. RESULTS 

We applied the multicanonical algorithm primarily to 
study thermodynamic properties, e.g., the "phase" be- 
havior of off-lattice heteropolymers. Before discussing 
these results, however, we analyze the capability of mul- 
ticanonical sampling to find lowest-energy conformations, 
in particular the native fold, because the identification of 
these structures is not only interesting as a by-product 
of the simulation. Rather, since they dominate the low- 
temperature behavior, it is necessary that they are gen- 
erated frequently in the multicanonical sampling. 



A. Search for global energy minima 

In order to be able to investigate the low-temperature 
behavior of AB off-lattice heteropolymers, we first have 
to assess the capabilities of the multicanonical method to 
sample lowest-energy conformations and, in particular, to 
approach closely the global energy minimum conforma- 



tion. In Table d we list the Fibonacci sequences p that 
have already been studied in Ref. by means of an 
off-lattice variant of the improved version 0| of the cele- 
brated chain-growth algorithm with population control, 
PERM 25]. In Ref. 16], first estimates for the putative 
ground-state energies of the Fibonacci sequences with 13 
to 55 monomers were given for AB model I in three 
dimensions. We compare these results with the respec- 
tive lowest energies found in our multicanonical simula- 
tions and with the results obtained with the minimiza- 
tion algorithm ELP |^. It turns out that the ground- 
state energies found by multicanonical sampling agree 
well with what comes out by the ELP minimizer, cf. Ta- 
ble ^ Another interesting result is that our estimates 
for the ground-state energies lie significantly below the 
energies quoted in Ref. obtained with the off-lattice 
PERM variant, and our values are even lower than the 
energies obtained by PERM and subsequent conjugate- 
gradient minimization in the attraction basin. 

We have also performed this test with model II and 
we can compare our results with minimum energies listed 
in Ref. ^3] j where the so-called annealing contour Monte 
Carlo (ACMC) method was applied to these Fibonacci 
sequences. From Tablellllwe see that we find with MUCA 
and ELP runs lower energies for the sequences with 21, 
34, and 55 monomers, while the results for the 13mer are 
comparable. 

Note that the multicanonical algorithm is not tuned to 
give good results in the low-energy sector only. For all 
sequences studied in this work, the same algorithm also 
yielded the thermodynamic results to be discussed in the 
following section. 

In order to check the structural similari- 
ties of the lowest-energy conformations ob- 
tained with multicanonical sampling and those 
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FIG. 2: Side (left) and top view (right) of the global energy 
minimum conformation of the 55mer (AB model I) found with 
the ELP minimization algorithm (dark spheres: hydrophobic 
monomers - A, light spheres: hydrophilic - B). 



from the ELP optimization runs, X 
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Here, — ~ denotes 

the position with respect to the respective centers of mass 
Xq — ^j/-^ of the zth monomer of the lowest-energy 

conformations found with multicanonical sampling and 
ELP minimization, respectively. Obviously, the rmsd is 
zero for exactly coinciding conformations and the larger 
the value the worse the coincidence. The minimization of 
the sum in Eq. lO is performed with respect to a global 
relative rotation of the two conformations in order to find 
the best match. For the explicit calculation we used the 
exact quaternion-based optimization procedure described 
in Ref. 

As an alternative, we introduce here another parame- 
ter that enables us to compare conformations. Instead of 
performing comparisons of positions it is much simpler 
and less time-consuming to calculate the so-called over- 
lap between two conformations by comparing their bond 
and torsion angles. As an extension of the torsion-angle 
based variant |28| , we define the more general overlap 
parameter 



Q(X,X') 



Nt + m^ d(X, XQ 

Nt + Nb 



(6) 



where (with Nt = iV — 3 and Ni, = N — 2 being the 
numbers of torsional angles $i and bond angles Qi = 



TABLE IV: Sequences used in the study of thermodynamic 
properties of heteropolymers as introduced in Ref. 7J. The 
number of hydrophobic monomers is denoted by #A. 



No. 



sequence 



#A 



20.1 BA6BA4BA2BA2B2 14 

20.2 BA2BA4BABA2BA5B 14 

20.3 A4B2A4BA2BA3B2A 14 

20.4 A4BA2BABA2B2A3BA2 14 

20.5 BA2B2A3B3ABABA2BAB 10 

20.6 A3B2AB2ABAB2ABABABA 10 



IT — di, respectively) 



1 

d(X,X') = - K]d,($,,,<l,^) 



E 

1=1 



dt(<i>„$:) = min(|$,-$^|,27r-|<i>,-$^|), 

4(9,, e^) = |e, -e^|. 

Since — tt < < it and < 8^ < tt it follows immedi- 
ately that < dt^b < TT. The overlap is unity, if all angles 
of the conformations X and X' coincide, else < Q < 1. 

Note that both models are energetically invariant un- 
der reflection symmetry. This means that also the land- 
scape of the free energy as function of the bond and 
torsion angles is trivially symmetric with respect to the 
torsional degrees of freedom. Consequently, unless ex- 
ceptional cases, there is thus a trivial twofold energetic 
degeneracy of the global energy minimum, but the re- 
spective rmsd and overlap parameter are different for the 
associated conformations. The values quoted throughout 
the paper were obtained by comparing the lowest-energy 
conformation found with both reference conformations 
and quoting the value indicating better coincidence (i.e., 
lower rmsd and higher overlap). This obvious ambigu- 
ity can be circumvented by adding a symmetry-breaking 
term to the models that disfavors, e.g., left-handed helic- 
ity [H. 

In Table IIIII we list the values of both parameters for 
the lowest-energy conformations found for the Fibonacci 
sequences of Table 13 We see that for the shortest se- 
quences (with 13 and 21 monomers) the coincidence of 
the lowest-energy structures found is extremely good and 
we are pretty sure that we have found the ground states. 
In the case of the 34mer, modelled with AB model I, 
both structures also coincide still very well and it seems 
that the attraction valley towards the ground state was 
found within the multicanonical simulation. In the sim- 
ulation of the 34mer with model II the situation is dif- 
ferent. As seen from Table Ull we found surprisingly the 
marginally lower energy value in the multicanonical sim- 
ulation, but the associated conformation differs signif- 
icantly from that identified with ELP. It is likely that 
both conformations do not belong to the same attraction 
basin and it is a future task to reveal whether this is a first 
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TABLE V: Minimal energies and temperatures of the maxi- 
mum specific treats for the six 20mers of Table HVl usins AB 
model f as obtained by multicanonical sampling. The global 
maximum of the respective specific heats is indicated by a star 
(★). For comparison, we have also given the globally minimal 
energies found from minimization with ELP as well as the re- 
spective rmsd and the structural overlap parameter Q of the 
corresponding minimum energy conformations. 



No. E, 



MUCA 



rmsd Q 



(1) 



T, 



(2) 



20.1 -33.766 -33.810 0.048 0.954 0.27(3)* 0.61(5) 

20.2 -33.920 -33.926 0.015 0.992 0.26(4)* 0.69(4) 

20.3 -33.582 -33.578 0.025 0.990 0.25(3)* 0.69(3) 

20.4 -34.496 -34.498 0.030 0.985 0.26(3) 0.66(2)* 

20.5 -19.647 -19.653 0.017 0.988 0.15(2) 0.41(1)* 

20.6 -19.322 -19.326 0.047 0.989 0.15(2)° 0.35(1)* 

"Specific fieat of sequence 20.6 possesses only one maximum at 



7.(2) 



0.35. Tfie value given for 
turning point. 



(1) 



belongs to the pronounced 



indication of metastability. The situation is even more 
complex for the 55mer. As the parameters tell us, the 
lowest-energy conformations identified by multicanonical 
simulation and ELP minimization show significant struc- 
tural differences in both models. 

Answering the question of metastability is strongly re- 
lated with the problem of identifying the folding path or 
an appropriate parameterization of the free energy land- 
scape. Due to hidden barriers it is practically impossible 
that ordinary stand-alone multicanonical sampling will 
be able to achieve this for such relatively long sequences. 
Note that, in our model I multicanonical simulation for 
the 55mer, we precisely sampled the density of states 
over 120 orders of magnitude, i.e., the probability for 
finding randomly the lowest-energy conformation (that 
we identified with multicanonical sampling) in the con- 
formational space is even less than 10" ^^^l In Fig. [3 we 
show two views of the global energy minimum confor- 
mation with energy i?min ~ —42.4 for the 55mer found 
by applying the ELP minimizer to AB model L The hy- 
drophobic core is tube-like and the chain forms a helical 
structure (which is here an intrinsic property of the model 
and is not due to hydrogen-bonding obviously being not 
supplied by the model). 



B. Thermodynamic properties of heteropolymers 

Our primary interest of this study is focussed on ther- 
modynamic properties of heteropolymers, in particular 
on conformational transitions heteropolymers pass from 
random coils to native conformations with compact hy- 
drophobic core. We investigated six heteropolymers with 
20 monomers as studied by Irback et al. in Ref. with 
AB model IL The associated sequences are listed in Ta- 
ble llVI Notice that the hydrophobicity (= monomers 
in the sequence) is identical (= 14) for the first four se- 



TABLE VL Same as Table El but using AB model IL For 
comparison the specific heat maximum locations Ts, esti- 
mated in Ref. are also given. 



No. E, 



MUCA 



E„ 



rmsd Q Tc [7] 



20.1 -58.306 -58.317 0.006 0.999 0.35(4) 0.36 

20.2 -58.880 -58.914 0.009 0.997 0.33(4) 0.32 

20.3 -59.293 -59.338 0.010 0.997 0.29(3) 0.30 

20.4 -59.068 -59.079 0.007 0.998 0.27(4) 0.27 

20.5 -51.525 -51.566 0.012 0.998 0.33(5) 0.33 

20.6 -53.359 -53.417 0.014 0.996 0.25(2) 0.26 



quences 20.1-20.4, while sequences 20.5 and 20.6 possess 
only 10 hydrophobic residues. In Ref. J}, the thermody- 
namic behavior of these sequences was studied by means 
of the simulated tempering method. For revealing low- 
temperature properties, an additional quenching proce- 
dure was performed. In our simulations, we used multi- 
canonical sampling over the entire range of temperatures 
without any additional quenching. 



1. Multicanonical sampling of heteropolymers with 20 
monomers 



As described in Sec. IIV Al the multicanonical method 
is capable to find even the lowest-energy states with- 
out any biasing or quenching. We proved this also 
for the 20mers of Table IIVI by comparing once more 
with the ELP minimization method. In Tables |Vl 
and IVII we present the estimates of the global energy 
minima in both models we found in these simulations. 
Once more, the values obtained with multicanonical 
sampling agree pretty well with those from ELP min- 
imization. The respective structural coincidences are 
confirmed by the values for the rmsd and the overlap 
also being given in these tables. In order to identify 
conformational transitions, we calculated the specific 
heat Cv{T) = {{E^) - {E}^)/kBT^ with {E^) = 

g{E)E'' eM-E/kBT)/ g{E) eM-E/ksT) 
from the density of states g{E). The density of states 
was found (up to an unimportant overall normalization 
constant) by reweighting the multicanonical energy 
distribution obtained with multicanonical sampling 
to the canonical distribution p^^^'^ (^E) at infinite 
temperature (/3 = l/keT = 0), since g{E) = pca„,oo(-£;)^ 
Figure 13 shows, as an example, the density of states 
g{E) (normalized to unity) and the multicanonical 
histogram for sequence 20.1 simulated with AB model 
II. We sampled conformations with energy values lying 
in the interval [—60.0,50.0], discretized in bins of size 

0. 01, and required the multicanonical histogram to 
be flat for at least 70% of the core of this interval, 

1. e., within the energy range [—43.5,33.5]. Within and 
partly beyond this region, we achieved almost perfect 
flatness, i.e., the ratios between the mean and maximal 
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FIG. 3: Density of states g{E) (normalized to unity over 
the plotted energy interval) and flat multicanonical histogram 
H{E) for sequence 20.1 (AB model II). 



histogram value, -ffmoan/-ffmax, as well as the ratio 
between minimum and mean, -ffmin/^^mcan, exceeded 
0.9. As a consequence of this high-accurate sampling 
of the energy space this enabled us to calculate the 
density of states very precisely over about 70 orders 
of magnitude. The energy scale is, of course, bounded 
from below by the ground-state energy -Emin, and that 
we closely approximated this value can be seen by the 
strong decrease of the logarithm of the density of states 
for the lowest energies. This strong decrease of the 
density of states curves near the ground-state energy is 
common to all short heteropolymer sequences studied. 
It reflects the isolated character of the ground state 
within the energy landscape. 

We used the density of states to calculate the specific 
heats of the 20mers in both models. The results are 
shown in Fig. 0] A first observation is that the specific 
heats obtained from simulations of AB model I show up 
two distinct peaks with the low-temperature peak located 
at Tp^-* and the high-temperature peak at T^^-* compiled 
in Table The AB model II, on the other hand, fa- 
vors only one pronounced peak at Tc with a long-range 
high-temperature tail. In Table IVll we compare our peak 
temperatures Tc with the values given in Ref. 7]. They 
are in good correspondence. 

The sequences considered here are very short and the 
native fold contains a single hydrophobic core. Interpret- 
ing the curves for the specific heats in Fig. 0] in terms 
of conformational transitions, we conclude that the het- 
eropolymers simulated with AB model I tend to form. 



(1) 



< T < T, 



(2) 



C ' 



within the temperature region T, 

termediate states (often also called traps) comparable 
with globules in the collapsed phase of polymers. For se- 
quences 20.5 and, in particular, 20.6 the smaller number 
of hydrophobic monomers causes a much sharper transi- 
tion at T^^ than at Tp^-* (where, in fact, the specific heat 
of sequence 20.6 possesses only a turning point). The 

(2) 

pronounced transition near is connected with a dra- 
matic change of the radius of gyration, as can be seen 
later in Fig. [T] indicating the collapse from stretched to 



highly compact conformations with decreasing temper- 
ature. The conformations dominant for high temper- 

(2) 

atures T > T^-, are random coils, while for tempera- 
tures T < primarily conformations with compact 
hydrophobic core are favored. The intermediary globu- 
lar "phase" is not at all present for the exemplified se- 
quences when modeled with AB model II, where only the 
latter two "phases" can be distinguished. We have to re- 
mark that what we denote "phases" are not phases in the 
strict thermodynamic sense, since for heteropolymers of 
the type we used in this study (this means we are not fo- 
cussed on sequences that have special symmetries, as for 
example diblock copolymers A„B„j), a thermodynamic 
limit is in principle nonsensical. Therefore conforma- 
tional transitions of heteropolymers are not true phase 
transitions. As a consequence, fluctuating quantities, for 
example the derivatives with respect to the temperature 
of the mean radius of gyration, d{Rgyr) / dT , and the mean 
end-to-end distance, d{Ree)/dT, do not indicate confor- 
mational activity at the same temperatures, as well as 
when compared with the specific heat. 

We conclude that conformational transitions of het- 
eropolymers happen within a certain interval of tempera- 
tures, not at a fixed critical temperature. This is a typical 
finite-size effect and, for this reason, the peak tempera- 
tures T^^^ and T^'' (for model I) and Tc (for model II) 
defined above for the specific heat are only representa- 
tives for the entire intervals. In order to make this more 
explicit, we consider sequence 20.3 in more detail. In 
Figs.EJa) (for AB model I) andE^b) (AB model II), we 
compare the energetic fluctuations (in form of the spe- 
cific heat Cv) with the respective fluctuations of radius 
of gyration and end-to-end distance, d{Roc,gyr)/dT = 
((i?ec,gyr£^) - {Rcc, gyr) {E)) / hgT^ Obviously, the tcm- 
peratures with maximal fluctuations are not identical and 
the shaded areas are spanned over the temperature inter- 
vals, where strongest activity is expected. We observe for 
this example that in model I (Fig.l^a)) two such centers 
of activity can be separated linked by an intermediary 
interval of globular traps. In fact, there is a minimum of 



the specific heat at T^"^ between T^^'' and T^"' , but the 
increase of internal energy by forming these states, which 

is given by dT Cv{T), is rather small and the glob- 

ules are not very stable. From Fig.|SJb), we conclude for 
this sequence that in model II no peculiar intermediary 
conformations occur, and the folding of the heteropoly- 
mer is a one-step process. 

It is widely believed and experimentally consolidated 
that realistic short single-domain proteins are usually 
two-state folders . This means, there is only one fold- 
ing transition and the protein is either in the folded or 
an unfolded (or denatured) state. Therefore, AB model 
II could indeed serve as a simple effective model for two- 
state heteropolymers. The main difference between both 
models under study is that model II contains an implicit 
torsional energy which is not present in model I. This 
is in correspondence with more knowledge-based Go-like 



.(2) 
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models with explicit torsional energy contributions for 
the study of small proteins with known typical two-state 
folding-unfolding kinetics [T^ . 



Nonetheless, there are also examples of small pep- 
tides exhibiting two clear peaks in the specific heat. In 
Ref. the artificial peptide AlaioGlysAlaio was stud- 
ied in detail and it turned out that two transitions sep- 
arate the ground-state conformation and random coil 
states. One is the alanine mediated helix-coil transition 
and the second the formation of a glycine hairpin that 
leads to a more compact conformation. 



2. Comparison with the homopolymer 

It is also interesting to compare the thermodynamic 
behavior of the heteropolymers with 20 monomers as de- 
scribed in the previous section with the homopolymer 
consisting of 20 A- type monomers, A2Q. This is the con- 
sequent off-lattice generalisation of self-avoiding interact- 
ing walks on the lattice (ISAW) that have been exten- 
sively studied over the past decades. In contrast to het- 
eropolymers, where, because of the associated sequence 
of finite length, a thermodynamic limit does not exist, ho- 
mopolymers show up a characteristic second-order phase 
transition between random coil conformations ( "good sol- 
vent") and compact globules ("poor solvent"), the so- 
called 9 transition [30l| . 

In this study, our interest was not focussed on the 
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FIG. 5: Fluctuations of energy (specific heat), radius of gy- 
ration and end-to-end distance for sequence 20.3 from simu- 
lations with (a) AB model I and (b) AB model II. 
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FIG. 6: Specific heats of the homopolymer A20 with 20 
monomers for both models. 



Q transition but more on the direct comparison of the 
finite-size homopolymer and the different heteropolymer 
sequences. In Fig. |B1 we have plotted the specific heats 
of this homopolymer for both models under study. The 
first observation is that, independent of the model, the 
collapse from random coils (high temperatures) to globu- 
lar conformations (low temperatures) happens, roughly, 
in one step. There is only one energetic barrier as indi- 
cated by the single peak of the specific heats. 

In Fig. Ul we have plotted for both models the mean 
radii of gyration as a function of the temperature for 
the sequences from Table Hvl in comparison with the ho- 
mopolymer. For all temperatures in the interval plotted. 
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FIG. 7: Mean radius of gyration {Rgyr) as a function of the 
temperature T for the sequences 20.1-20.4 (solid curves), 20.5, 
20.6 (long dashes) and the homopolymer A20 (short-dashed 
curve) for (a) AB model I and (b) AB model II. 



the homopolymer obviously takes more compact confor- 
mations than the heteropolymers, since its mean radius 
of gyration is always smaller. This different behavior is 
an indication for a rearrangement of the monomers that 
is particular for heteropolymers: the formation of the hy- 
drophobic core surrounded by the hydrophilic monomers. 
Since the homopolymer trivially also takes in the ground 
state a hydrophobic core conformation (since it only con- 
sists of hydrophobic monomers), which is obviously more 
compact than the complete conformations of the het- 
eropolymers, we conclude that hydrophobic monomers 
weaken the compactness of low-temperature conforma- 
tions. Thus, homopolymers and heteropolymers show 
a different "phase" behavior in the dense phase. Ho- 
mopolymers fold into globular conformations which are 
hydrophobic cores with maximum number of hydropho- 
bic contacts. Heteropolymers also form very compact hy- 
drophobic cores which are, of course, smaller than that 
of the homopolymer due to the smaller number of hy- 
drophobic monomers in the sequence. In total, however, 
heteropolymers are less compact than homopolymers be- 
cause the hydrophilic monomers are pushed off the core 
and arrange themselves in a shell around the hydrophobic 
core. For model I, we also see in Fig.|7Ja) a clear tendency 
that the mean radius of gyration and thus the compact- 
ness strongly depends on the hydrophobicity of the se- 
quence, i.e., the number of hydrophobic monomers. The 
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curves for sequences 20.5 and 20.6 (long-dashed curves) 
with 10 A's in the sequence can clearly be separated from 
the other heteropolymers in the study (with 14 hydropho- 
bic monomers) and the homopolymer. This supports the 
assumption that for heteropolymers the formation of a 
hydrophobic core is more favorable than the folding into 
an entire maximally compact conformation. 



V. SUMMARY 

We have investigated two coarse-grained off-lattice het- 
eropolymer models of AB type that mainly differ in 
the modelling of energetic bending and torsional barri- 
ers. While the original AB model (model I) ,6], which 
was first introduced for two-dimensional heteropolymers, 
treats the polymer as a stiff chain of hydrophobic (A) and 
hydrophilic {B) monomers without considering torsional 
barriers, model II favors bending and an additional con- 
tribution containing also torsional energy is regarded . 
Another noticeable property of model I is that contacts 
between A and B monomers are always suppressed, in 
contrast to model II. 

We studied these models by means of the multicanon- 
ical Monte Carlo method. In a first test, we checked 
the ability of the algorithm to find lowest-energy confor- 
mations. The results were compared with minimum en- 
ergy values obtained with the energy-landscape paving 
(ELP) algorithm by measuring the rmsd and a gener- 
alized overlap parameter that in addition to torsional 
degrees of freedom also allows the comparison of bond 
angles. We found very good coincidences for minimum 
energies and associated conformations for all sequences 
under study, with the exception of the 34mer in model 
II and the 55mer in both models. In the latter case the 
random walk in the energy space, which is considered as 
the system parameter, in not sufficient to find the global 
energy minimum and a more detailed study of the ori- 
gin of the free energy barriers, i.e., the identification of 
an appropriate order parameter, is required. Nonethe- 
less, for all sequences we obtained much lower values for 
the respective putative global-energy minimum than for- 
merly quoted in the literature using an off-lattice chain- 
growth algorithm |l6j| and also considerably lower values 
than with the annealing contour Monte Carlo (ACMC) 
method \vf\ . 

Our main objective was the comparison of the confor- 
mational transitions in both models. We primarily stud- 
ied energetic and conformational fluctuations of several 
sequences with 20 monomers and found that in model 
I there is a general tendency that, independent of the 
sequence, the folding from random coils (high tempera- 
tures) to lowest-energy conformations is a two-step pro- 
cess and the folding is slowed down by weakly stable in- 
termediary conformations. This is different in model II, 
were traps are avoided and the heteropolymers exhibit a 
two-state folding behavior. This is (qualitatively!) in 
correspondence with the observation that many short 



peptides seem to possess a rather smooth free-energy 
landscape, where only a single barrier separates unfolded 
states and native fold ■ Most of the previous investi- 
gations in this regard were performed by means of a kind 
of knowledge-based potentials, where topological prop- 
erties (e.g., the native contacts) of the folded state ex- 
plicitly enter. These potentials allow then quantitative 
studies of the dynamics of the folding process for the 
specified protein. In our study, however, we were more 
interested in the influence of basic principles on the fold- 
ing transition and therefore quantitative comparison with 
the folding of realistic proteins is not appropriate at this 
level of abstraction. 

In order to reveal folding properties specific to het- 
eropolymers, we also compared with a purely hydropho- 
bic polymer. Our results for the homopolymer are in ac- 
cordance with the widely accepted view of the formation 
of compact globular conformations below the O transi- 
tion temperature. The globules, which are in our in- 
terpretation only compact hydrophobic cores with max- 
imum number of hydrophobic contacts, minimize the 
surface exposed to the (implicit) solvent. This behav- 
ior was observed for both models which differ, for the 
purely hydrophobic homopolymer, only in local covalent 
bond properties, but not in the non-bonded interaction 
between the monomers. Since heteropolymers with the 
same chain length contain less hydrophobic monomers, 
the hydrophobic core is, although still very dense, smaller 
and the hydrophilic monomers form a shell surrounding 
the core and screening it from the solvent. This, in conse- 
quence, leads to a native conformation that qualitatively 
differs from the typical globules known from polymers. In 
models allowing for intermediary states (such as model I 
in our study), the globular "phase" is actually present for 
heteropolymers, too, and is dominant in a temperature 
interval between the hydrophobic-core phase at very low 
temperatures and the random coils that are present at 
high temperatures. 

In conclusion, simple, coarse-grained off-lattice het- 
eropolymer models without specific or knowledge-based 
parameterization are attractive as they allow for the 
study of how basic energetic contributions, bonded and 
non-bonded, in the model are related to and influence the 
conformational folding process of heteropolymers, respec- 
tively. Herein, the main focus is on the study of general 
properties of these systems. In perspective, such mod- 
els, after refinements with respect to a few specific inter- 
atomic interactions (e.g., hydrogen bonds) will be suit- 
able for studying heteropolymers of a few hundreds of 
monomers even quantitatively without the requirement 
of a prior knowledge of the final fold. 
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